Cloud Infrastructure & Reliability Innovator | New York, NY, USA | Remote

Westbury Partners • Full-time • Remote (New York, NY, US) • $250k - $350k / year • 1m ago

Join our client, a premier high-frequency trading firm with global influence, in their Sydney office, which is designed to be a high-speed APAC hub for tech-driven trading. Our client owns mission-critical systems, moves fast, and delivers continuously. Their work is valued, visible, and essential to business success.

What You’ll Do:

As a Cloud Infrastructure & Reliability Innovator, you’ll play a key role in designing, building, and maintaining the systems that power our critical applications. You’ll collaborate with developers and IT teams to ensure the scalability, reliability, and performance of our infrastructure, using automation and observability to stay ahead of incidents.

Your Responsibilities Will Include:
• Designing and implementing scalable, highly available, and fault-tolerant systems in the cloud
• Developing infrastructure as code using tools like Terraform, Ansible, or Pulumi
• Building and maintaining CI/CD pipelines for seamless deployments
• Monitoring application performance and proactively resolving issues
• Managing incident response and postmortems to continuously improve system resilience
• Automating operational tasks to eliminate toil and improve efficiency
• Working cross-functionally with developers, QA, and DevOps to deliver top-tier software
• Participating in on-call rotations and helping improve alerting and incident systems
• Ensuring systems meet security and compliance standards

Why Join Us:
• Work on mission-critical infrastructure in a fast-paced, forward-thinking environment
• Collaborate with world-class engineers passionate about reliability and automation
• Opportunities for continuous learning and certification
• Flexible working, including remote options
• Competitive salary, bonus structure, and benefits package

About You:
• Proven experience in SRE, DevOps, or infrastructure engineering roles
• Proficiency with cloud platforms (AWS, GCP, or Azure)
• Strong coding/scripting skills in Python, Go, Bash, or similar
• Familiarity with container orchestration (Kubernetes, Docker)
• Deep understanding of monitoring and observability tools (Prometheus, Grafana, ELK, etc.)
• Passion for system reliability, performance, and automation
• Excellent problem-solving and communication skills

Join us in redefining how reliable, scalable, and high-performance systems are built. If you're driven by solving complex infrastructure challenges with clean, automated solutions, we want to hear from you.

#SiteReliabilityEngineer #SRE #DevOps #CloudInfrastructure #InfrastructureAsCode #Kubernetes #Automation #CI_CD #Monitoring #Observability #Terraform #Python #AWS #SystemReliability #EngineeringJobs
